than three of you. I have given this talk in a room of three people. So hopefully... Did they understand it? No, they had a lot of trouble. But we got to work back and forth and figure it out. Anyway, so you're here to learn Elasticsearch. I've got 50 minutes, I believe, so we'll see if I can fit an hour-long talk within 50 minutes. Hi, my name is John Berryman. Just to do a quick introduction to myself: find me on Twitter; a lot of my self-worth is derived from my Twitter followers. Growing up, I was a pretty nerdy kid that started reading programming manuals when I was in like the first grade. I ended up getting into aerospace engineering. That was my first career. I decided that satellites and all that stuff were pretty cool, but I liked the programming and I liked the math. So after about four years in the field, I moved. I got into search technology and I was a consultant. I wrote a book. That guy right there. Wouldn't necessarily recommend doing that with your life, but it's a good calling card. And now I work at Eventbrite. I am a discovery engineer, so search and recommendations and stuff like that. To give you a little preview of what we're going to talk about: this is not really an advertisement for Elasticsearch, but a lot of what we're doing involves my mental model for thinking through the Eventbrite problem. So just to give you a little shared background, historically the company I work at, Eventbrite, has been a very organizer-focused startup. We allow organizers who want to put on their own events to come to our website. You can build a nice little webpage with little effort to get it all set up. You can sell tickets; we take care of all the credit card mess. You have a platform for messaging attendees, and you get metrics. So after the event is done, you get to look back and make sure that your next events are as good as or better than this event.
But after years of actually nailing down this side of the market pretty well, my company realized that, look, we've got all this inventory. We're basically white-labeled, but everyone is plastering their events on our website. If we can turn around and sell our inventory to everyone else, then organizers are happy, the customers are happy because you can find something to do over the weekend, and we're hoping to generate the so-called flywheel effect. This is exciting for me because this is where I belong. Creating the marketplace is all about building search and browsing and recommendation features for Eventbrite. Of course, this technology is based on Elasticsearch, which is what we're talking about today. Can you guys keep it secret? So I know we're supposed to talk about Elasticsearch today, but I've got to tell you, I'm actually more interested in talking about my new startup. Yep, so don't tell anybody, but I'm going to start competing directly against Eventbrite. Our guiding principles, and I'm sorry to do this to you, I know we're supposed to talk about Elasticsearch, but Elasticsearch is hard. So I'm going to focus this new startup, and you guys can join me if you'd like. It's going to be built on MySQL, because everyone knows how databases work. Databases are easy. Let's just build on a tried and true platform, and let's not overthink it. Our specialty, because I found a free data thing online, is cat-related events. We'll start with cat-related events. Good. See, we have some attendees already. Then we'll expand to other fields. All right. We have someone who will at least buy our tickets, so we have a marketplace. Excellent. So building this new website is going to be pretty easy. There's not really too much to an event. So here's our schema with MySQL. We're going to have an ID as an integer, a name, a description, a city, a start date. You can look at all that. That makes pretty good, simple sense.
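A minimal sketch of that schema in Python, using the built-in sqlite3 module as a stand-in for MySQL (the column names come from the talk; the exact types, the sample event, and the price column that shows up later in the talk are my guesses):

```python
import sqlite3

# In-memory stand-in for the MySQL events table described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id          INTEGER PRIMARY KEY,
        name        TEXT,
        description TEXT,
        city        TEXT,
        start_date  TEXT,   -- ISO-8601 date string
        price       REAL
    )
""")
conn.execute(
    "INSERT INTO events (name, description, city, start_date, price) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Teach Your Cat to Knit", "Knitting, but with cats", "Nashville",
     "2024-06-01", 15.0),
)

# "Select star from events" -- everything the website needs comes back.
rows = conn.execute("SELECT * FROM events").fetchall()
```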
My hypothesis, which I know will play out well, is that we can build a website based on this. I'll demonstrate it. Here's our event search: select star from events. That gives us all the details we'll need back for the website. We have date range search. Obviously, you'll need that to find something this weekend. We have geo search. Not hard. Why invest in all that stuff? We can just do string matching. Finally, it's easy to search for events that you like. I want to find an event where the name equals cat. The results are nothing. Oh. So this is the interactive part. Why do you think there might not be any results for that particular MySQL query? Yeah. Okay. So that's a little problem. I could spell cat with misspellings. These are all overloading my brain. I think we can still make this work out. You guys can't spoil all my slides before I get to them. All right. The particular problem here, to the first answer, is that probably no one's going to name their event "cat". Would you like to come to see "cat"? MySQL solves this for us. We can use a like query: like percent cat percent. The results come back as Teach Your Cat to Knit, An Evening of Cat Bowling, and BYOC Cat Dance Party. We're on board. That was just a silly thing to show you that we can probably accomplish this. Let's get more serious with a more serious query. Someone's likely to be looking for a cat farming seminar. So we're going to help them. What? Not in a bad way. That might have particular meaning to you that it doesn't to most of my audiences. Not that event. So anyway, how do we search for this? If someone comes to our website and they look for cat farming seminar: select star from events where name like percent cat farming seminar percent. Yes. Well, it's in red, which is the thing I would like to match, but it doesn't match. Interactive time. What have I done wrong now? Case. That's right. MySQL is all uppity about case. So this is also not hard.
All we have to do is lowercase whatever the people type to us, and it'll still work. Cat farming seminar. So okay, great. That matches. But "seminar for farming of cats"? Not such a match. Anyone have any ideas how I can deal with this one? Cats or farming. Well, let's try and. I want to make sure. Yeah, so okay. So let's do something like this. Good idea. Good idea. And well, it's starting to itch me a little bit, because I heard that like is not as efficient a query as a pure match. But surely not, right? And we're doing it three times. So it's kind of like scanning every document in the database three times, right? But we'll probably shard it and that scale will be fine, I'm sure. So anyway, we do indeed match that seminar for farming of cats. But we don't yet match "making a cat farm: the seminar". And now you're totally in my head because I didn't realize that this was a potentially derogatory thing. Making a cat farm: the seminar. So why does that one not match? Farming versus farm. Well, they're the same thing, right? Yeah. So with search technologies, they do a pretty good job of understanding language, and I guess we'll have to cut off the ends of the words. So farming becomes farm; at least that'll match farmer, farms, and other stuff. And we do indeed get back the results we want. I'm trying to poke some holes in my little theory here though. This is an old presentation. Are you telling me I should retire my presentation after this time? Oh yes, you're right. Okay, so I should have updated the dates on my examples on my slide for Mr. Michael Handlin in front of me. So next one: cat farm class. Doesn't match either. It's a class. It's kind of like a little mini seminar. In order to make that work, what am I going to have to do for that one? Oh, okay, okay. It doesn't match all the terms. But at least if it matches like a couple of them, that should be good enough, right?
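The lowercase-plus-ANDed-LIKEs approach can be sketched like this (again with sqlite3 standing in for MySQL; the event names are the talk's examples):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT)")
conn.executemany("INSERT INTO events VALUES (?)", [
    ("Cat Farming Seminar",),
    ("Seminar for Farming of Cats",),
    ("Making a Cat Farm: The Seminar",),
    ("An Evening of Cat Bowling",),
])

def search(conn, query):
    """AND together one LIKE per query term, lowercasing both sides.
    Note each LIKE is a full table scan -- this is the inefficiency
    worried about in the talk."""
    terms = query.lower().split()
    where = " AND ".join("lower(name) LIKE ?" for _ in terms)
    params = [f"%{t}%" for t in terms]
    return [row[0] for row in
            conn.execute(f"SELECT name FROM events WHERE {where}", params)]

hits = search(conn, "cat farming seminar")
# "Making a Cat Farm: The Seminar" is missed: "farming" is not a
# substring of "farm" -- which is why the talk reaches for stemming.
```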
So I replaced my ands with ors, per someone's suggestion earlier. And what happens? I do indeed match everything I want. And I match all these things I don't want. And since there's no notion of which match is better than another match, the stuff we don't want can land at the top just as easily as the stuff we do. So guys, I think we're sunk. I apologize for taking you through this startup with me. But databases are very good at some things, and search engines and search technology are very good at a different set of things. In particular, search engines are quite good at finding documents that not only match exactly what you have, but contain specific tokens, phrases of tokens, and different mutations of the tokens. They understand English in a way that I think you'll understand when you leave here. Scoring and sorting of documents: MySQL finds the set that matches, whereas with Elasticsearch, as we'll see in a little bit, you can put into it an understanding of how good or bad a match is for particular search terms. And finally, this is something that both MySQL and Elasticsearch are good at, but it's become an interesting, more recent use case with search technologies: search engines are actually really good for filtering, grouping, and aggregating data. So search engines came out of the information retrieval field, but they're being used more and more for log analytics and stuff like that. And we'll touch on that right at the end. Alright, so now since we've failed, let's go ahead and get back to the main talk that you guys came here for. We're going to teach you about Elasticsearch, and in the next 30 minutes, we'll do a really quick and dirty application. I'll show you how to pull down Elasticsearch, create an index, index stuff, and retrieve it. We'll take a peek under the hood so that you can see the data structures and algorithms in place.
Fortunately, the data model for Elasticsearch is simple enough that you can leave with a basic understanding of it. And as I promised, we'll get into some of the data aggregation stuff that Elasticsearch has been used for more recently, and then we'll hopefully have a little time for questions. What I want you guys to get out of this in particular is a couple of meta goals. One, I want you to see me using a very basic implementation of Elasticsearch, and I want it to be approachable for you, so it's a tool on your shelf that you can grab and learn more about when you need it. The second thing, and I encourage you to do this with any data store technology that you want to use, is I want to impart an intuition about how these data structures work, what they're good at, and a little bit about what they're not good at. This means that when you reach for the shelf to get your tool, you actually get the right tool. So building a basic search app is not that hard. There's a lot of tuning that comes with Elasticsearch, getting the behavior and the notion of relevance just right, but getting the thing out of the box and turning it on will actually get you about 50% of the way there. So it's a real quick technology to get up and running with and get some good results. Installing and running Elasticsearch is pretty easy. You all probably know what wget is. So find your favorite mirror and pull down Elasticsearch. In this case, I do need to update my notes here; it's a slightly older version of Elasticsearch. But pull it down, unzip it to wherever you want it to live, cd into that directory, and then start the binary, bin/elasticsearch. Once you do that, you can just curl localhost at the Elasticsearch port, 9200, and it tells you, hey: "You Know, for Search." Like, in case you forgot that it was for search. But Elasticsearch is now up and running.
And just like with MySQL, with Elasticsearch you will want to think in advance about the type of data that you're going to be interacting with and build a schema for it, or, as they say in Elasticsearch, a mapping. Now, Elasticsearch is interesting here, because early on they advertised that they were a schema-less data store, in the age when MongoDB was rocketing off and everyone was kind of tacking onto this. And it was true to an extent: you could just start dumping information into Elasticsearch. That gained Elasticsearch a lot of popularity, but it's still kind of an anti-pattern. In my opinion, after years of using this technology, it's still very important to think through what you're getting ready to do with this thing. So setting up the mapping is simple. Everything in Elasticsearch is a JSON interface. And since this is a Python conference, in every example that you'll see here I am using the Python client. It's really nice; it's a fairly thin layer over the JSON interface of Elasticsearch. So when you're setting up a schema, all you have to do is specify the fields that you're going to have, in this case: ID, name, description, city, start date, price. And you get all of the things that you would typically think of existing in a data store. So you have numbers, integers, floats, strings, dates. And you can start to get more complex things: you can get locations that are a little bit more aware than just two numbers; it knows what a location is. But one thing I'll be focusing on is that not only can you have strings, you can say that your strings are special in some way. For example, an ID is a type of string, but it is a string that is not analyzed. That means that we're not going to do any special massaging and trying to understand this string as natural language. However, both the name and the description here, I've marked as having an analyzer that is English.
So this is me giving Elasticsearch a hint that not only is this blob of bytes actually text, it's English text. And I'll show you what that means to Elasticsearch in a little bit. But it's interesting, because you don't have to put English here. You can put Chinese or Japanese or most any language that you'd want. And you can make up your own stuff. There are extra rules you can put in; for example, if you have camel-case strings because you're indexing programming languages, you can break those up and make your own analysis chain for it. And then of course, here's me using the client. You create the index with that mapping structure. Okay, so we have an index set up, ready to receive events. Actually adding the events at that point is pretty simple. You have an array of events, and it's just JSON blobs again. The client is nice because you can use datetimes and it does the right thing. And then the simplest version is just an iterator: for every doc that you have, dump it into Elasticsearch. This does make an HTTP request for every doc, so there are batch methods for once you actually want to put this into production. But that's an easy way to get up and running. Okay, so now we've got a bunch of documents in the index. The next bit is to pull stuff out of it. And the easiest way to explain this, oh yeah, sorry for the microscopic text; how horrible is that for the people in the back? I'll just speak louder. So the simplest building block for pulling stuff back is this match all query. And it does exactly what you think. It's effectively the select star from the events table. It gets everything back in the order that you indexed it in. And you don't have to understand what is on the screen here; I'll provide these notes on my Twitter account later. But it gives you back what you'd expect. It tells you how much time the query took. It tells you if there were any errors.
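The mapping and indexing steps just described might look like this sketch. It assumes the official elasticsearch-py client and uses the modern typeless mapping syntax (`text`/`keyword` rather than the older `string`/`not_analyzed` on the slide); the field names come from the talk, the sample doc is invented, and the network calls are wrapped in a function rather than executed:

```python
# Mapping ("schema") for the events index. "keyword" is the modern
# spelling of a not-analyzed string; "text" with an analyzer is an
# analyzed one.
mapping = {
    "mappings": {
        "properties": {
            "id":          {"type": "keyword"},
            "name":        {"type": "text", "analyzer": "english"},
            "description": {"type": "text", "analyzer": "english"},
            "city":        {"type": "keyword"},
            "start_date":  {"type": "date"},
            "price":       {"type": "float"},
        }
    }
}

docs = [
    {"id": "1", "name": "Teach Your Cat to Knit", "city": "Nashville",
     "description": "Knitting, but with cats",
     "start_date": "2024-06-01", "price": 15.0},
]

def index_events(es, index_name, docs):
    """Create the index with the mapping, then index each doc.
    One HTTP request per doc -- use the bulk helpers in production."""
    es.indices.create(index=index_name, body=mapping)
    for doc in docs:
        es.index(index=index_name, body=doc)
```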
And obviously, importantly, it gives you all the hits back: all the documents that match the query, sorted by how well they match the query. In the case of match all, there's no notion of relevance, so you just get them back in the order that you indexed them. Alright, so that was the hello world of making a query. But there are a lot of different things you can do to craft the notion of relevance. What is an important document? What should match? What should not? And the smallest building block for these is the so-called term query. So if we have an indexed document, an event in Nashville, and I wanted to make a filter over all the documents and only hit documents corresponding to the city Nashville, then that's a term query. I say this is a term, the field is city, the token is Nashville. The special thing about a term query is, just like earlier where I said not analyzed, term means that this is just a token. It has to be exactly Nashville, capital N and all. It doesn't do anything special. And so that's a match. But where it gets interesting, and where you really get a benefit from a search engine, is when you start incorporating this notion of: hey, this is not just a string, this is actually English text. And so if we have a sort of stupid document here, name equals Filbert Sorting for Fun and Profit, then a query that is not of type term but of type match actually applies that special knowledge that this is English. And so rather than looking for the exact tokens, it knows that it can be lowercased, we can split on spaces, and sorting and sort should carry basically the same information, and so that's a match. So compared to how you'd have to do that in MySQL, you would have to make a horrendous query to make that one simple match right there, and it would also perform very poorly, for reasons that I'll get into in a little bit.
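The term and match queries from this section, written as the request bodies you'd hand to the search endpoint (a sketch; the Python client takes these as plain dicts):

```python
# Exact-token filter: only documents whose city field holds the exact
# token "Nashville" -- no lowercasing, no splitting, nothing special.
term_query = {"query": {"term": {"city": "Nashville"}}}

# Analyzed match: the query text goes through the same English analysis
# as the field did at index time, so "filbert sort" can match a
# document named "Filbert Sorting for Fun and Profit".
match_query = {"query": {"match": {"name": "filbert sort"}}}
```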
Getting more complicated, because your application has to mix a lot of different ideas together, you can do phrase matching. So not only do we have the notion of matching documents that have these terms, we want a document that has the terms sorting and filbert in it, in that order. This is not a match, because the original document had filbert sorting. However, if we search for "filbert sort", that is a match, despite the fact that it's different from the original document. The original document has uppercase and different parts of speech. But think about it as a user looking for something: you don't quite remember the name of the movie, but you're probably going to type something like this. So getting these kinds of fuzzy matches is a specialty of search technology. "Filbert fun" won't match, because there are words between filbert and fun; that's just more example of how match phrase works. But you can add this notion of slop, and everyone chuckles when I do that one. That's what it's called. You can add slop and it'll find any document that has these two words within two positions of each other. You can go nuts with this. I once had a gig with the US patent office, on the search technology that they were getting rid of as they moved to Elasticsearch's sister project, Solr. They really wanted: I want to find this word within the same sentence as some other word, and I want to find it before, or within some number of words. So you can take this same behavior and overload it and get some really complex search behavior. But everything I've shown you to this point is atomic. It's like, I want this thing or that thing. You have to have a way of gluing these things together. In Elasticsearch that is a Boolean query. In normal notions of Boolean you think ands and ors and nots. Elasticsearch has that, but using different terminology. Rather than and we say must, rather than or we say should, and then not is must not.
So that one makes pretty good sense. And if you play around with a few queries, you see why they moved to this terminology. Usually you have an array of things that must match. So in your Elasticsearch query you have a must key, and you stick all these sub-clauses that must match there. And additionally you have several things that don't have to match but should match: if you can find documents that also happen to have these other things, they should boost a little bit higher. So that's yet another array of things that, if they match, give you a better score. For each of these pieces, you also have the ability to adjust weights. So we're starting to get into a notion of how search understands what's important to your customers and to your business. You can not only match documents that match the queries, you can also boost documents that we need to sell quickly, because of expiring inventory or something like that. And that leads us to our next big topic: search relevance. I'm curious how many people here have heard of the notion of TF-IDF? Okay, only this half of the room. That's interesting. You guys should have mixed in a little bit more. It's not a hard concept, and I think it's intimidating at first, but I can break it down pretty easily. This will be a little bit of a math-y slide, but not too bad. First off, TF really just means term frequency, and I'll get into that. And IDF means inverse document frequency. And rather than giving you the Webster's definition, the best way of explaining this is through an example. Let's say a user comes to your website and makes a search for "the diddle". Now that seems odd, until you realize that one of the matching documents in your index is Hey Diddle Diddle, the Cat and the Fiddle. That's actually a pretty good match for it.
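Putting must, should, and must not together, a bool query might look like this sketch (the field names, clause contents, and boost value are made up for illustration):

```python
bool_query = {
    "query": {
        "bool": {
            # All of these have to match (like AND).
            "must": [
                {"match": {"name": "cat"}},
            ],
            # None of these may match (like NOT).
            "must_not": [
                {"term": {"city": "BFE"}},
            ],
            # These don't have to match, but documents that do match
            # score higher (like OR). "boost" weights one clause
            # relative to another.
            "should": [
                {"match": {"description": {"query": "farming",
                                           "boost": 2.0}}},
                {"match": {"description": "seminar"}},
            ],
        }
    }
}
```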
So let's do a little practice round and see what this document would be scored as from the search engine's perspective. Term frequency is simply the number of times a term occurs in a document. So the TF of "the" in this case is 2; "the" occurs twice. Similarly, just by coincidence, "diddle" also occurs twice. So TF for both of those guys is 2. So far so good? Inverse document frequency: sometimes I just wish they'd called it document frequency and put a 1 over it. Basically, document frequency is the number of times the term occurs, not in this document, but across the entire set of documents. So the document frequency for "the" is pretty high, which means the inverse document frequency for "the" is just about 0. Makes sense? And the document frequency for "diddle", not a very common word, is low; it only occurs in 7 documents. So it's actually very important, and it gets an inverse document frequency score of 1 over 7, which is a lot, lot, lot higher than 0. So when you finally figure out the total score of this document against this query, you put all those pieces together. The score is the TF-IDF score for "the" plus the TF-IDF score for "diddle". You've probably already made sense of it, but just to be a little bit redundant: TF of "the" is 2, IDF of "the" is about 0, so that term goes away. TF of "diddle" is 2 and IDF of "diddle" is 1 over 7, so that term is 2 sevenths, and you get the final result of 0.2857, blah, blah, blah. But the idea is that every document is going to go through the same process and be sorted, and so the way you craft your query informs the way this math works on your documents. You might have 10,000 matches, but you want to make sure you do the right thing so the top 10 search results are what the user wants. Okay, so that was a pretty overloading slide. I always like to take a break after heavy slides like that, and I think play is really therapeutic, and in particular I think that this, this is my favorite one. Ah, that's great. We're going to watch that one more time. I love this part of the talk.
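The arithmetic from that TF-IDF slide, worked in Python. This is the simplified form from the talk (IDF as a bare 1/df); real Lucene scoring adds normalization factors and nowadays defaults to BM25, so treat this as the intuition, not the production formula. The document frequency for "the" is an invented large number:

```python
# Query: "the diddle"
# Document: "Hey Diddle Diddle, the Cat and the Fiddle"

# tf: how often each query term occurs in this document.
tf = {"the": 2, "diddle": 2}

# df: how many documents in the whole index contain the term.
# "the" is everywhere (10,000 here is invented); "diddle" is rare.
df = {"the": 10_000, "diddle": 7}

def idf(term):
    # Simplified inverse document frequency: 1 over document frequency.
    return 1.0 / df[term]

score = sum(tf[t] * idf(t) for t in ["the", "diddle"])
# "the" contributes 2 * (1/10000), essentially 0;
# "diddle" contributes 2 * (1/7) = 0.2857...
```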
Okay, that was a good break. So, to this point, how much time have I got left, by the way? At this point we've done a lot to get you in the mind space of how search works from a mechanical perspective: how to dump stuff in, how to pull stuff out, what it can do compared to other data stores like MySQL that I was picking on. The next thing that we want to do is dive inside the data store and give you a little intuition about how the pieces inside work, and what you'll find is not that complicated. So after this section you'll have a little better understanding about when it's right to use Elasticsearch and when it's not. With any data store there are two main chunks that you have to understand: how you get data in, and how you get data out. So that's the outline for the next bit. The first step of getting data into Elasticsearch is a step called analysis. Basically we're going to take a document, in this case just one field out of a document, and I will show you how it effectively gets shredded and rearranged and shoved into the data structures that make search technology so fast. Our example in this case is the sentence "The conspirators conspire conspicuously." I chose it so that I could almost not pronounce it at a conference. Tokenization, that's the first step. In this case we have told Elasticsearch, hey, this is English, and that gives us some interesting things that we can play off of. We know that English is split on whitespace and also punctuation. We can basically throw out punctuation. An interesting side note that I always like to make here is that this is not true of a lot of languages on the other half of the earth.
So, like, my wife is Japanese, and there are places where you can have symbols right next to each other that are different words, and doing the same thing in Japanese, which you still have to do, requires a really complex algorithm to know where the best place is to split these things to make a logical sentence. So tokenization itself is a fairly deep topic. The next step is actually a fairly shallow topic: lowercasing. Pretty easy, but if someone types in lowercase you'd better make sure that it matches a document that has uppercase letters. Stop wording: a lot of the words in English are just noise words. They help us understand where things are placed relative to each other, but they don't really change the content. So we can throw away words like "the" and "is" and "was" and stuff like that. Perhaps my favorite step of analysis is stemming. This is another place where, because we've given Elasticsearch the hint that this is English, it knows some interesting tricks. If you want a document for farming to match a query for farms, which is often the case, then effectively that's what stemming accomplishes. You can take a word and, using a statistical technique, effectively chop off and sometimes modify the end of the word to make tokens that are easier to match no matter what the intent of the person searching was. Alright, the next step after analysis is indexing. So our example sentence has turned into these three tokens: conspir, conspir, conspicu. Sounds like Latin. Let's say that this is document one. The secret sauce of Elasticsearch for being so fast is effectively that during the indexing process it takes these sentences, turns them into a bunch of tokens, and then it effectively transposes that. So instead of "document one has these tokens", at the end of the analysis, when you've gone through all of your documents, you say "these tokens have these documents".
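The analysis chain just described (tokenize, lowercase, stop-word, stem) can be sketched in a few lines of Python. The stemmer here is a toy suffix-stripper invented for this example; Elasticsearch's english analyzer uses a real rule-based stemmer, so the stop-word list and suffix list are assumptions, not the actual analyzer:

```python
import re

STOPWORDS = {"the", "is", "was", "a", "an", "and", "of", "for"}

# Toy suffix list, ordered longest-first, chosen so the talk's example
# sentence stems the way the slide shows. Not the real algorithm.
SUFFIXES = ["ators", "ously", "ing", "s", "e"]

def stem(token):
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    # 1. Tokenize: split on whitespace and punctuation.
    tokens = re.findall(r"[A-Za-z]+", text)
    # 2. Lowercase.
    tokens = [t.lower() for t in tokens]
    # 3. Stop-wording: throw away noise words.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 4. Stem: chop the ends off so farming matches farms, etc.
    return [stem(t) for t in tokens]

tokens = analyze("The conspirators conspire conspicuously.")
```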
So document one had these tokens, but in the end, conspir appeared in document one as well as these two other documents. Conspicu appeared in document one as well as these three other documents. So effectively, from a Python point of view, you could implement this with a dictionary where the keys are tokens and the values are arrays of IDs. Now, under the hood this is actually implemented in Java, and they do a lot of sneaky stuff. They shim extra information into the keys: all the notions of document frequency, which we use for scoring, get shoved over into the keys when you look stuff up. And all the notion of term frequency, that's the other half of TF-IDF, is basically hidden in the values on the right, along with other information like the positions of the words in the documents, so you can do phrase matches and stuff like that. But effectively, a simple search engine is just a Python dictionary like that. Alright, so we have now gotten all the information into the index. The other half of the equation is getting information out of the index. So our inverted index looks like this, and given that data structure, what's the easiest way to find all documents that contain conspicuous and aardvarks? Anyone? Yep, that's all you have to do. These are lists, but they might as well be sets or iterators, and you find whichever IDs occur in both. And you can build arbitrarily complex things on the same idea: an or is just a set union, and a more complicated combined search is a set union followed by another set intersection. Pretty easy. But that's only half the puzzle, because MySQL is really good at finding documents that match. I just showed you how Elasticsearch finds matching documents efficiently, but Elasticsearch then has to turn around and do a sorting algorithm, which is the other important aspect of search.
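The Python-dictionary search engine just described can be glued together like so (the doc IDs and token lists are invented; real Lucene also stashes term/document frequencies and positions alongside these posting lists):

```python
from collections import defaultdict

# Analyzed documents: doc ID -> tokens.
docs = {
    1: ["conspir", "conspir", "conspicu"],
    2: ["conspir", "aardvark"],
    3: ["conspicu", "aardvark"],
}

# Transpose into the inverted index: token -> set of doc IDs.
index = defaultdict(set)
for doc_id, tokens in docs.items():
    for token in tokens:
        index[token].add(doc_id)

def search_and(*tokens):
    # AND = set intersection of the posting lists.
    return sorted(set.intersection(*(index[t] for t in tokens)))

def search_or(*tokens):
    # OR = set union of the posting lists.
    return sorted(set.union(*(index[t] for t in tokens)))

hits = search_and("conspicu", "aardvark")  # docs containing both
```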
When Google gives you back the 60,000 results it supposedly says you have for your query, you only see the top 10, and they're usually pretty good. If you scrolled down 50,000 pages, they would probably be less good. So it's important to know how that works. Effectively, what happens is that when your user gives you a query, you have an iterator over all the documents that match. And what you do to find the top 10 is you initialize a priority queue. Do you all know roughly what a priority queue is? We can talk about that. But effectively, for every document that comes through, you take it off of that iterator, you look at all the other secret stuff we've hidden in there and find the score for that document, and you put the document and that score on your priority queue, and something just iterates, doing that with every single match that exists. The interesting aspect of this priority queue, though, is that it doesn't keep up with every document it ever sees. It's only of length 10, or whatever you tell it to be. So once you're past the first 10 documents, when you've got one that scores lower than the current top 10, it doesn't even compare itself against all 10; it compares against a few of them, log n or so, says "I'm lower than all of these, never think of me again", and it's gone. So the whole thing is actually pretty efficient. Now there's a little side note here, another intuition that might be important for Elasticsearch. If you're doing some sort of relevance but you also want to return 100% of the documents, think about how you'd implement that. Deep paging is what this is called. If you've got a robot scanning your website for the 10,000th to 10,010th most fun event, then this means that you have to have a priority queue that is 10,010 long, and you sort all the documents in, throw away the first 10,000 of them, and give that chunk back.
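The top-10-via-priority-queue idea can be sketched with Python's heapq (the scores and doc IDs are made up; heapq is a min-heap, so the worst of the current top k sits at the root and gets evicted first):

```python
import heapq

def top_k(scored_docs, k=10):
    """scored_docs: iterator of (score, doc_id) pairs.
    Keeps only k items in memory no matter how many documents
    stream through, and returns them best-first."""
    heap = []  # min-heap: heap[0] is the worst of the current top k
    for score, doc_id in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Better than the current worst: evict it, keep this one.
            heapq.heapreplace(heap, (score, doc_id))
        # else: lower than everything already kept --
        # "never think of me again."
    return sorted(heap, reverse=True)

matches = [(0.1, "a"), (0.9, "b"), (0.5, "c"), (0.7, "d"), (0.2, "e")]
best = top_k(matches, k=3)
```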
And guess what happens when the robot carelessly goes to the next page: it just gets worse and worse and worse. So that's one important intuition to have about search technology. Elasticsearch allows you to turn that off if you don't care about relevance. But if you do, I would recommend not letting anyone get past about 500 results. Alright, and then, as was said, it returns the highest-priority contents from that queue. That's effectively what we do. After the top 10, they go away; the data structure is only 10 items long, so it can't hold any more than that. Oh yeah, yeah, yeah. That's not a bad idea. I don't know how I would implement that in Elasticsearch. I don't think they make that easy for you. But yeah, that totally checks out. Alright. Okay, so I need a little transition slide here. Effectively, that gets us through everything that a search engine was until about three years ago. Elasticsearch came out of information retrieval, library-technology-type stuff: finding whatever I wanted to find. But Elasticsearch has started to prove the point really strongly that the same data structures that serve search results are actually really good for online analysis, log parsing, stuff like that. And a big chunk of that is its ability to do aggregations. And I think I can convince you that it's basically what we were doing before, plus one extra step, and you get this nice ability to do aggregations almost for free. So just like before, whenever we're aggregating, say we want to find the histogram of the ticket prices or something like that. We have all the results that we had before. We do the sorting like we did before. But while we still have that document in hand, we push it through an aggregator. It's basically just a little in-memory thing that says, okay, how many documents have I seen from $10 to $20? And it just increments those counters. For every document, it does this.
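That counter-incrementing aggregator can be sketched like this (a price histogram with $10 buckets; the documents and prices are invented):

```python
from collections import Counter

def price_histogram(matching_docs, interval=10):
    """As each matching document streams by during the search, drop its
    price into a bucket and bump the counter -- which is why
    aggregation is nearly free once you're already touching every
    match."""
    buckets = Counter()
    for doc in matching_docs:
        bucket = (doc["price"] // interval) * interval
        buckets[bucket] += 1
    return dict(buckets)

docs = [{"price": 5}, {"price": 12}, {"price": 18}, {"price": 25}]
hist = price_histogram(docs)
```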
And at the end of it, you pass back this aggregator thing, and you have these really nice results. It was just something you did almost as a byproduct of the actual search itself. So with the building blocks I've given you, you can see how we have the ability to easily filter, which is just what a search is. You can group stuff, because as the documents come through, you can already figure out which group each belongs to. And within each group, you can do calculations: running averages or anything like that. To give you a little more intuition about how you might use aggregations, here is how I encountered them for the first time. Let's say you go to Amazon. You're chuckling. That top book, by the way, is a really excellent book. Anyway, if you go to e-commerce sites, you see a lot of the original use for aggregations. They were called facets: faceted search. You have a list of subcategories on the side, with counts for how many things are in each category. You can click on one, and it serves as a filter. It gives you a bit of what I call relevance feedback, so you can understand what's actually happening. But people have taken the same data structure, turned it on its side, and you've got really nice histograms. At Eventbrite, we're making them prettier now, but you can use them to feed back good information about how many tickets were sold in a particular class. You can take exactly the same information with a different data set and give spark charts for how many tickets were sold on a particular day. You can take, again, counts over buckets, plot them on a map, and you've got a really nice geo information console to give you intuition about where things are happening in geospatial relationship to each other. And finally, I don't know exactly how to make a picture for it, but log analytics in particular are great with Elasticsearch.
Building aggregations in Elasticsearch is easy. I'm going to fly through this so I have time for a couple of questions. Effectively, all you have to do is keep making your query like normal, but add a new section to your Elasticsearch query called aggs. In this particular case it's going to be hard to read, so I'll blur over it. But you can say things like: for my aggregations, I want counts grouped by city. That's a terms aggregation where the field is city. And I also want a histogram aggregation for the prices with an interval of 10. That's the second thing. The results come back, and you have the normal search results at the very top, but there's a new section that has these aggregations in it. In this case, I've got the city buckets right there with my Nashville and Dallas and BFE events, and I've got my price buckets for the distribution of events that occurred. But here's a neat thing you can do. Right now (I really needed a graphic for this) I've got two separate aggregations. A neat thing to provide back to our users is not only the histogram of all the events, but a histogram per city. And you can do this with Elasticsearch: aggregations can be arbitrarily nested. There are performance issues after some point. But I can say: at the top level, do a terms aggregation, so we bucket everything by city and I get the counts back. And then within that aggregation, do a histogram, so that we can show our users the price distribution within the city they're interested in. The results come back in a very similar structure, except appropriately nested, so that for each city bucket you have the count, and within that you have sub-buckets for the histogram, so that you can draw it on the screen. That's effectively it.
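A sketch of that nested request body, written as a Python dict. The index and field names ("city", "price", and the "cats" match) are illustrative assumptions, not Eventbrite's real schema:

```python
# Nested aggregation sketch: a terms aggregation by city, with a price
# histogram computed inside each city bucket. Field names are made up.
query = {
    "query": {"match": {"description": "cats"}},
    "aggs": {
        "by_city": {
            "terms": {"field": "city"},
            "aggs": {
                # Nested: a separate price histogram *per city bucket*.
                "price_histogram": {
                    "histogram": {"field": "price", "interval": 10}
                }
            },
        }
    },
}
```

With a client such as the official Python one, this body would be sent as something like `es.search(index="events", body=query)`, though the exact call signature varies by client version. The response then carries the normal hits plus an `aggregations` section with a bucket per city, each containing its own histogram sub-buckets.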
I've been doing this a while, so I still have a lot to learn, but also a lot of other things I would enjoy talking about. If you're interested in learning more on your own, I know of some reading material. And, you know, find me on Twitter; tell me what I did right and what I did wrong. Anyway, that's it. What have you got? Any questions? So, repeating questions, I guess, right? The question was around how we deal with unknown terms, different languages, jargon terms, stuff like that; we can specify English or not, but what about the rest? The easy answer is you still just say it's English if it's basically English. You still get the ability to split on whitespace and all that, because that's presumably what the text looks like. I'll go to the other extreme in a second. And you still do stemming, which means that even if it's a verb the engine hasn't seen before, stemming actually does pretty well for English-like things. But if you're willing to put the work in, you have an arbitrary amount of control over what you can do. At the other extreme end of things, I guess you could write your own Java; it's all pluggable, it's just Lucene, Java Lucene. You could write your own classes to do whatever custom logic you want. If you don't want to go quite that far, there are middle-ground things like synonyms. As a preprocessing step, before you do the stemming and chop off and throw away the ends of words, you can say: here's a file of every jargon word you might see. And you can either say, don't touch it in the downstream stuff, or you can say, this maps to three other words, or these three words map to one word. So there's a lot of flexibility in what you can do to tune that relevance notion. But it might be a lot of work. He had a question first. You... yes and no.
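A hedged sketch of how that middle-ground synonym setup might be wired into Elasticsearch index settings. The analyzer and filter names are made up, and the synonym list is purely illustrative; the point is that the synonym filter runs before the stemmer in the token filter chain:

```python
# Illustrative index settings: a custom analyzer where jargon synonyms
# are applied as a preprocessing step before stemming. Names are made up.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "jargon_synonyms": {
                    "type": "synonym",
                    "synonyms": [
                        "meetup, gathering, event",  # all treated alike
                        "kitty => cat",              # one-way mapping
                    ],
                }
            },
            "analyzer": {
                "english_with_jargon": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "jargon_synonyms",  # runs before stemming
                        "porter_stem",      # then chop off word endings
                    ],
                }
            },
        }
    }
}
```

The ordering in the `filter` list is the whole trick: because `jargon_synonyms` sits ahead of `porter_stem`, jargon terms can be expanded or normalized before the stemmer throws away their endings.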
Part of that is that not only do we hide the term frequency, the count of how often each of those terms occurred in the documents, but we also hide a few other small things that we stick next to the tokens. We hide each term's position in the document, which gets to your question about phrases. And you can hide a couple of other things that aren't used as often: you can hide part of speech there if you have that set up, and you can hide a payload, which you can use however you want, like boosting documents that contain certain words a little bit higher. One thing you can't do, though, is take this data structure and reassemble the original document from it. That's why, whenever you store a document in Elasticsearch, it gets shredded and turned into that inverted index, and at the same time there's a different file on disk, pulled into memory, from which the original document can be read back out. So you're effectively storing everything twice. A document at Eventbrite is an event. It has what I call the boring fields that you'd expect: the name, description, the date, geolocation (which actually gets interesting). But we also have, and this is in progress, interesting machine-learned fields like an event cluster that we can later match up with a user cluster that comes in, or event quality, which is another thing we're inferring from the metadata around the event. Those are all things that Elasticsearch is happy to deal with. Beyond that, there's not too much that's a mind-blowing departure from what I showed here. The event is just a JSON record; we hand it to Elasticsearch, it inverts it into that data structure, and it stores the original as well. Elasticsearch stores both. [The rest of this exchange was inaudible.]
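As a hedged illustration of what such an event document might look like as the JSON record handed to Elasticsearch; every field name and value here is made up, not Eventbrite's actual schema:

```python
# Illustrative only: an event as a JSON-style record. Elasticsearch would
# both invert this into the index and store the original source, i.e.
# store it twice, as described above.
event = {
    "id": 42,
    # The "boring" expected fields:
    "name": "Nashville Cat Lovers Meetup",
    "description": "An evening of cat-related festivities.",
    "city": "Nashville",
    "start_date": "2016-06-01T19:00:00",
    "location": {"lat": 36.16, "lon": -86.78},  # geolocation
    # In-progress, machine-learned fields mentioned in the talk:
    "event_cluster": 7,     # matched later against a user cluster
    "event_quality": 0.83,  # inferred from surrounding metadata
}
```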
[Inaudible.] How complex is it to reindex changes? Really cool question. A really nice property of Elasticsearch is that the index is append-only: segment files on disk are effectively never touched again once written. The caveat is that when you actually change a field, what happens is you go back to where the record was written, read out the entire document, change that one field, and write the whole thing to a new segment file. The only change made to the old file is that one bit gets flipped: the old copy is marked dead, tombstoned. So updates are not great, but it's a trade-off; you get benefits from treating segments that way. It's definitely not a table scan; it's still pretty quick. Cool, so I have exactly zero minutes left. Please come back, talk to me later. And thank you very much for coming.